ABSTRACT

The noncentral chi-squared distribution with zero degrees of freedom is
defined as a Poisson mixture of mass at zero together with chi-squared
distributions that have even degrees of freedom. Its name is justified
by the decomposition of the classical noncentral chi-squared distribution as
the sum of a central chi-squared component having the full number of degrees
of freedom and an independent noncentral chi-squared component having zero
degrees of freedom. The basic properties of this one-parameter family of
distributions are given, and they are shown to be useful in the computation
of approximate critical values of a test for uniformity.


AMS  (MOS)  Subject  Classifications:  Primary  60E05,  62E10,  62E20,  62F05 

Secondary  62M15,  62Q05 

Key Words: Compound Poisson distribution, Goodness-of-fit testing,
Testing for periodicity

Work Unit Number 4 (Probability, Statistics, and Combinatorics)


* 

Department  of  Statistics  and  Mathematics  Research  Center,  University  of 
Wisconsin  at  Madison. 


Sponsored by the United States Army under Contract No. DAAG29-75-C-0024 and
by the National Science Foundation under Grant No. MCS75-17385 A01.


SIGNIFICANCE  AND  EXPLANATION 




The chi-squared family of statistical tests and probability distributions
is the basis for many tests of significance and goodness-of-fit in
statistics.  This  paper  reports  the  discovery  of  a new  distribution  from 
this  family:  the  noncentral  chi-squared  distribution  with  zero  degrees  of 
freedom.  This  distribution  cannot  be  defined  in  the  conventional  way,  which 
explains  why  it  was  unnoticed  until  now.  However,  it  can  be  properly  defined 
in  another  way,  and  it  leads  to  the  previously  impossible  decomposition  of 
the  classical  noncentral  chi-squared  distribution  into  two  parts:  a 
completely  central  component  with  all  of  the  degrees  of  freedom,  and  a 
completely  noncentral  component  with  no  degrees  of  freedom. 

The distribution is also useful in its own right in connection with
testing the hypothesis that given observations, all between 0 and 1,
are independently chosen from the uniform distribution in (0,1).

An  application  is  outlined  in  conjunction  with  the  improvement  upon 
Sir  R.  A.  Fisher's  test  for  periodicity  in  a time  series  reported  in  MRC 
Technical  Summary  Report  #1843.  The  noncentral  chi-squared  distribution 
with  zero  degrees  of  freedom  provides  much  better  approximate  critical 
values,  necessary  for  the  use  of  this  test,  than  does  the  usual  Normal  or 
Gaussian distribution approximation.


THE  NONCENTRAL  CHI-SQUARED  DISTRIBUTION  WITH  ZERO  DEGREES 
OF  FREEDOM  AND  TESTING  FOR  UNIFORMITY 

★ 

Andrew  F.  Siegel 

1. The Noncentral Chi-Squared Distribution with Zero Degrees of Freedom:
Definition  and  Basic  Properties 

The  central  and  noncentral  chi-squared  distributions  are  fundamental 
tools  in  many  areas  of  theoretical  and  applied  statistics  (Lancaster  (1969)). 
In  this  paper,  attention  is  focused  on  an  unexplored  group  of 
distributions  from  this  family:  those  with  zero  degrees  of  freedom.  Their 
definition and basic properties are given in this section, and an example
of their use in testing for uniformity is given in Section 2. It is
expected  that  many  additional  applications  will  be  found  in  the  future. 

There  are  several  reasons  why  the  case  of  zero  degrees  of  freedom 
has  been  overlooked  since  Fisher,  in  1928,  first  derived  the  noncentral 
chi-squared distribution. First, it does not possess a probability
density function because of a discrete mass point at zero. Second, it cannot
be defined simply as the sum of independent squared normal deviates with
variance one. And third, the central chi-squared distribution with zero
degrees of freedom is identically zero, wrongly suggesting that the
general case of zero degrees of freedom would be trivial.

The noncentral chi-squared distribution, $\chi_0^2(\lambda)$, with zero degrees
of freedom and noncentrality parameter $\lambda > 0$ is most directly approached
as a compound Poisson mixture of central chi-squared distributions with
even degrees of freedom. This extends the standard representation (for
example on page 132 of Johnson and Kotz (1970)) to the case of zero
degrees of freedom. We define

$$ Y_\lambda \sim \chi_0^2(\lambda) $$

to be the result of the two-stage process in which we first choose $K$
from a Poisson distribution with mean $\lambda/2$, so that

$$ P(K = k) = e^{-\lambda/2} (\lambda/2)^k / k!, \qquad k = 0, 1, 2, \ldots $$

and then choose

$$ Y_\lambda \sim \chi_{2K}^2 . $$

When $K = 0$, we adopt the convention that the (central) $\chi_0^2$ distribution
is identically zero; this accounts for the discrete component of the
$\chi_0^2(\lambda)$ distribution. Thus $\chi_0^2(\lambda)$ is a mixture of the distributions $0$,
$\chi_2^2$, $\chi_4^2$, $\chi_6^2$, ... with weights $\exp(-\lambda/2)$, $\exp(-\lambda/2)(\lambda/2)$,
$\exp(-\lambda/2)(\lambda/2)^2/2$, $\exp(-\lambda/2)(\lambda/2)^3/6$, ... .
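The two-stage definition translates directly into a sampler. The following is a minimal Python sketch using only the standard library; the names `sample_poisson` and `sample_chi2_zero_df`, the seed, and the sample size are illustrative assumptions, not part of the paper. It relies on the fact that a central chi-squared variable with 2k degrees of freedom is a Gamma variable with shape k and scale 2.

```python
import math
import random

def sample_poisson(mean, rng):
    """Draw from Poisson(mean) by Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def sample_chi2_zero_df(lam, rng):
    """Two-stage draw: K ~ Poisson(lam/2), then Y ~ central chi-squared with 2K d.f."""
    k = sample_poisson(lam / 2.0, rng)
    if k == 0:
        return 0.0  # convention: central chi-squared with 0 d.f. is identically zero
    return rng.gammavariate(k, 2.0)  # chi-squared with 2k d.f. = Gamma(shape k, scale 2)

rng = random.Random(12345)
lam = 2.0
draws = [sample_chi2_zero_df(lam, rng) for _ in range(100_000)]
frac_zero = sum(d == 0.0 for d in draws) / len(draws)
sample_mean = sum(draws) / len(draws)
print(frac_zero, math.exp(-lam / 2.0))  # empirical vs. theoretical mass at zero
print(sample_mean, lam)                 # empirical mean vs. noncentrality parameter
```

The fraction of exact zeros should settle near exp(-lam/2), which is the discrete component described above.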

The  basic  properties  of  this  distribution  can  be  derived 
directly from the compound Poisson representation. The characteristic
function, reproductivity properties, moments, cumulants,
asymptotic  behavior,  cumulative  distribution  function,  and 
density  (to  the  extent  that  one  exists)  will  be  exhibited  in 
the  remainder  of  this  section. 

The characteristic function of $Y_\lambda \sim \chi_0^2(\lambda)$ is

$$ \phi_\lambda(t) = E \exp(itY_\lambda) = \exp\{ it\lambda (1-2it)^{-1} \} \qquad (1.1) $$

which is obtained from the Poisson mixture of the characteristic
functions $(1-2it)^{-k}$ of the $\chi_{2k}^2$ distributions. This is the same
formula obtained by substituting zero for the degrees of freedom
in the characteristic function of the classical noncentral $\chi^2$
distribution.
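The mixing step behind the characteristic function can be written out explicitly. The following is a sketch of that calculation, using only the Poisson weights and the characteristic functions of the central chi-squared components with even degrees of freedom:

```latex
\begin{align*}
E\exp(itY_\lambda)
  &= \sum_{k=0}^{\infty} e^{-\lambda/2}\,\frac{(\lambda/2)^k}{k!}\,(1-2it)^{-k} \\
  &= \exp\!\left\{ -\frac{\lambda}{2} + \frac{\lambda/2}{1-2it} \right\}
   = \exp\!\left\{ \frac{it\lambda}{1-2it} \right\}.
\end{align*}
```

For comparison, the classical noncentral chi-squared distribution with n degrees of freedom has characteristic function $(1-2it)^{-n/2}\exp\{it\lambda/(1-2it)\}$, which reduces to the expression above when n = 0.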
The reproductive properties of this distribution follow immediately
from its characteristic function (1.1). Let $*$ denote the convolution
operator, so that $F * G$ denotes the distribution of $X + Y$ where $X$ and $Y$ are
independent random variables chosen from the distributions $F$ and $G$
respectively. Then we have

$$ \chi_0^2(\lambda_1) * \chi_n^2(\lambda_2) = \chi_n^2(\lambda_1 + \lambda_2) \qquad (1.2) $$

$$ \chi_0^2(\lambda) * \chi_n^2 = \chi_n^2(\lambda) . \qquad (1.3) $$

(1.3) is of particular interest because it allows us to decompose the
$\chi_n^2(\lambda)$ distribution into a completely central part with the full $n$ degrees
of freedom and a noncentral part without any degrees of freedom. Thus
$X \sim \chi_n^2(\lambda)$ can be decomposed as $X = Y + Z$ where $Y \sim \chi_0^2(\lambda)$ and
$Z \sim \chi_n^2$ are independent.
Going as far back as p. 669 of Fisher (1928), the $\chi_n^2(\lambda)$ is traditionally
decomposed as a convolution of $\chi_1^2(\lambda)$ and $\chi_{n-1}^2$, often (as in Theorem 1.1
on page 117 of Lancaster, 1969) by representing it as the sum of $n$
independent squared normal deviates with variance one and using a rotation
in $n$-space to bring the mean vector to the first coordinate axis. This
confounding of one degree of freedom with the noncentrality is no longer
necessary; a complete separation of noncentrality from all degrees of
freedom is now possible.

The cumulants of $Y_\lambda \sim \chi_0^2(\lambda)$ are seen from (1.1) to be
$\kappa_m = \lambda 2^{m-1} m!$. The moments can be found directly from a Poisson
mixture of the moments of the component central $\chi^2$ distributions.
The moments are

$$ E Y_\lambda^m = 2^m m! \sum_{k=1}^{m} \binom{m-1}{k-1} (\lambda/2)^k / k! . \qquad (1.5) $$

Moments and cumulants of low order are

  m   moment                                               central moment            cumulant
  1   $\lambda$                                            $0$                       $\lambda$
  2   $\lambda^2 + 4\lambda$                               $4\lambda$                $4\lambda$
  3   $\lambda^3 + 12\lambda^2 + 24\lambda$                $24\lambda$               $24\lambda$
  4   $\lambda^4 + 24\lambda^3 + 144\lambda^2 + 192\lambda$   $192\lambda + 48\lambda^2$   $192\lambda$      (1.6)
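The low-order entries can be checked against the closed-form moment and cumulant expressions with exact rational arithmetic. This is a small verification sketch; the function names `raw_moment` and `cumulant` and the test value 3 for the noncentrality parameter are illustrative assumptions.

```python
from fractions import Fraction
from math import comb, factorial

def raw_moment(m, lam):
    """E[Y^m] = 2^m m! * sum_{k=1}^m C(m-1, k-1) (lam/2)^k / k!  (exact arithmetic)."""
    half = Fraction(lam) / 2
    total = sum(comb(m - 1, k - 1) * half**k / factorial(k) for k in range(1, m + 1))
    return 2**m * factorial(m) * total

def cumulant(m, lam):
    """m-th cumulant: lam * 2^(m-1) * m!."""
    return Fraction(lam) * 2**(m - 1) * factorial(m)

lam = Fraction(3)
for m in range(1, 5):
    print(m, raw_moment(m, lam), cumulant(m, lam))

# The variance (second central moment) should equal the second cumulant, 4 * lam.
variance = raw_moment(2, lam) - raw_moment(1, lam) ** 2
print(variance)
```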


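Asymptotic normality of $Y_\lambda \sim \chi_0^2(\lambda)$ holds when $\lambda$ is large.
More precisely,

$$ \frac{Y_\lambda - \lambda}{2\sqrt{\lambda}} \xrightarrow{D} N(0,1) \quad \text{as } \lambda \to \infty \qquad (1.7) $$

which is quickly proven using the characteristic function (1.1).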

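Asymptotically when $\lambda$ is small, the positive component of
$Y_\lambda \sim \chi_0^2(\lambda)$ tends to $\chi_2^2$. Noting that most of the mass is at
zero in this case, we decompose $Y_\lambda$ into the mixture

$$ Y_\lambda = \begin{cases} 0 & \text{with probability } \exp(-\lambda/2) \\ Z_\lambda & \text{with probability } 1 - \exp(-\lambda/2) \end{cases} \qquad (1.8) $$

so that $Z_\lambda$ is the conditional random variable $Y_\lambda \mid \{Y_\lambda > 0\}$, which
is positive and continuous. In fact, $Z_\lambda$ is the mixture of
$\chi_2^2, \chi_4^2, \chi_6^2, \ldots$ with mixing probabilities $P(K = k \mid K > 0)$ where
$K \sim \mathrm{Poisson}(\lambda/2)$.
Then using the decomposition (1.8) and the characteristic function
(1.1) one can show that

$$ Z_\lambda \xrightarrow{D} \chi_2^2 \quad \text{as } \lambda \to 0 . \qquad (1.9) $$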


The cumulative distribution function of $Y_\lambda$ is

$$ F_\lambda(t) = 1 - e^{-(\lambda+t)/2} \sum_{k=1}^{\infty} \frac{(\lambda/2)^k}{k!} \sum_{j=0}^{k-1} \frac{(t/2)^j}{j!} $$

when $t \ge 0$ and is zero otherwise. These series converge quickly,
hence this formula is convenient for computing. Figure 1 shows
the cumulative distribution function $F_\lambda(t)$ of $Y_\lambda$ for various
values of $\lambda$. Clearly apparent are the discontinuities at $t = 0$
(due to the mass $\exp(-\lambda/2)$ at zero), asymptotic normality when
$\lambda$ is large, and asymptotic exponentiality ($\chi_2^2$) of the positive
component when $\lambda$ is small.
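A direct way to compute the cumulative distribution function is to sum the Poisson mixture term by term, using the finite-sum form of the central chi-squared tail with even degrees of freedom. The following Python sketch does this; the function name `cdf_chi2_zero_df`, the tolerance, and the iteration cap are assumptions, and the loop is intended for moderate values of the noncentrality parameter `lam`.

```python
import math

def cdf_chi2_zero_df(t, lam, tol=1e-14, kmax=10_000):
    """P(Y <= t) for the Poisson(lam/2) mixture of 0, chi2_2, chi2_4, ..."""
    if t < 0:
        return 0.0
    w = math.exp(-lam / 2.0)   # Poisson weight for k = 0, i.e. the mass at zero
    total = w
    term = math.exp(-t / 2.0)  # e^{-t/2} (t/2)^j / j! for j = 0
    upper = term               # P(chi2_{2k} > t) = e^{-t/2} * sum_{j<k} (t/2)^j / j!
    remaining = 1.0 - w        # Poisson mass not yet accounted for
    k = 0
    while remaining > tol and k < kmax:
        k += 1
        w *= (lam / 2.0) / k           # Poisson weight for this k
        total += w * (1.0 - upper)     # add weight times P(chi2_{2k} <= t)
        remaining -= w
        term *= (t / 2.0) / k          # advance the inner sum to j = k
        upper += term
    return total

print(cdf_chi2_zero_df(0.0, 10.0))  # jump at zero, equal to exp(-5)
print(cdf_chi2_zero_df(4.0, 2.0))
```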


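The density of the $\chi_0^2(\lambda)$ distribution, properly speaking, does
not exist due to the mass at zero. However, the positive part of this
distribution does have a "density" $f_\lambda(t)$ in the sense that if $Y_\lambda \sim \chi_0^2(\lambda)$
and $0 < a < b$, then

$$ P(a < Y_\lambda < b) = \int_a^b f_\lambda(t)\,dt . $$

Mixing the densities of the non-degenerate component central $\chi^2$
distributions, we find that

$$ f_\lambda(t) = \frac{1}{t}\, e^{-(\lambda+t)/2} \sum_{k=1}^{\infty} \frac{(\lambda t/4)^k}{k!\,(k-1)!} $$

which can also be expressed as

$$ f_\lambda(t) = \frac{1}{2} \sqrt{\lambda/t}\; e^{-(\lambda+t)/2}\, I_1\!\left(\sqrt{\lambda t}\right) $$

where $I_1$ denotes the first modified Bessel function. This is
not a true density because its total mass is only

$$ P(0 < Y_\lambda < \infty) = \int_0^{\infty} f_\lambda(t)\,dt = 1 - \exp(-\lambda/2) < 1 . $$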

If we normalize $f_\lambda$ and define

$$ g_\lambda(t) = f_\lambda(t) / (1 - \exp(-\lambda/2)) \qquad (1.14) $$

then $g_\lambda(t)$ is a true density. It is the density of $Z_\lambda = Y_\lambda \mid \{Y_\lambda > 0\}$,
the positive continuous random variable defined in (1.8).
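The deficit in total mass can be seen numerically. The sketch below evaluates the "density" of the positive part by mixing the central chi-squared densities with 2k degrees of freedom over the Poisson weights, written as the power series exp(-(lam+t)/2) * sum over k >= 1 of (lam/4)^k t^(k-1) / (k! (k-1)!), and then integrates it with a plain trapezoid rule. The function name `pseudo_density`, the number of series terms, and the grid are assumptions chosen for illustration.

```python
import math

def pseudo_density(t, lam, terms=200):
    """Mixture 'density' of the positive part at t >= 0, summed as a power series."""
    term = lam / 4.0          # k = 1 term: (lam/4)^1 t^0 / (1! 0!)
    total = term
    for k in range(2, terms + 1):
        term *= (lam / 4.0) * t / (k * (k - 1))  # ratio of consecutive series terms
        total += term
    return math.exp(-(lam + t) / 2.0) * total

lam = 2.0
h = 0.01
grid = [i * h for i in range(8001)]  # trapezoid rule on [0, 80]
values = [pseudo_density(t, lam) for t in grid]
area = h * (sum(values) - 0.5 * (values[0] + values[-1]))

print(pseudo_density(0.0, lam))          # intercept, (lam/4) exp(-lam/2)
print(area, 1.0 - math.exp(-lam / 2.0))  # area under the curve vs. 1 - exp(-lam/2)
```

Dividing `pseudo_density` by `1 - exp(-lam/2)` gives the normalized density of the positive part.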

Graphs of these "densities" $f_\lambda(t)$ are shown in Figures 2 and 3.
Figure 2 shows the case $\lambda < 2$, and we see clearly that they are not true
densities because the areas under the curves are not equal. This is
because as $\lambda$ increases, the mass $\exp(-\lambda/2)$ at zero decreases and is
moved to the right (to the positive continuous part), increasing the
area $(1 - \exp(-\lambda/2))$ under these curves. Again we note the exponential
($\chi_2^2$) form of these curves when $\lambda$ is small, as was shown in (1.9).

The ordinate intercept $f_\lambda(0)$ takes its maximum value of $1/(2e)$ at
$\lambda = 2$, and Figure 3 shows some "densities" $f_\lambda(t)$ when $\lambda > 2$ and this
intercept is decreasing in $\lambda$. When $\lambda = 10$, only a mass of .00674 remains
at zero, and we begin to see the trend towards asymptotic normality for
large $\lambda$ predicted by (1.7).


2.  Testing  for  Uniformity 

The purpose of this section is to show how the noncentral chi-squared
distribution with zero degrees of freedom can be used in the approximation
of critical values for a test of uniformity.
Many problems involving testing a hypothesis can be reduced to the
following situation: given data $X_1, \ldots, X_{n-1}$ all between 0 and 1, we
wish to test the null hypothesis that the $X_i$ were independently chosen
from the uniform distribution on the interval (0,1). That is, test

$$ H_0: X_1, \ldots, X_{n-1} \overset{\text{iid}}{\sim} U(0,1) . $$

Order the data and adjoin the endpoints to obtain
$0 = X_{(0)} < X_{(1)} < \cdots < X_{(n-1)} < X_{(n)} = 1$.
Then define the spacings $Y_j = X_{(j)} - X_{(j-1)}$, $j = 1, \ldots, n$.
We will consider tests based on statistics of the form

$$ T(n,a) = \sum_{j=1}^{n} (Y_j - a)_+ \qquad (2.1) $$


where $0 < a < 1$ and $(t)_+ = \max(t, 0)$ is the positive-part function. I have
shown these statistics to be useful in testing for periodicity in a time
series (Siegel 1979b). They are sensitive to the existence of large spacings.
They are adaptive and continuous, for they select only the largest
spacings (those with $Y_j > a$) and sum the excess of each such above the
threshold value $a$.
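Computing the statistic from data takes only a few lines. The following is a minimal sketch; the function name `excess_spacing_statistic` and the sample data are hypothetical and chosen only to show how a single large spacing drives the value.

```python
def excess_spacing_statistic(xs, a):
    """Sum of (spacing - a)_+ over the n spacings cut out of (0, 1) by n - 1 points."""
    pts = [0.0] + sorted(xs) + [1.0]  # adjoin the endpoints 0 and 1
    spacings = [hi - lo for lo, hi in zip(pts, pts[1:])]
    return sum(max(y - a, 0.0) for y in spacings)

data = [0.80, 0.10, 0.20, 0.15]  # hypothetical observations; spacings .10, .05, .05, .60, .20
t_stat = excess_spacing_statistic(data, 0.3)
print(t_stat)  # only the 0.60 spacing exceeds the threshold 0.3
```

With threshold 0 every spacing contributes in full, so the statistic equals 1 regardless of the data.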

Fisher's (1929) test for periodicity can be obtained as a special
case of (2.1) when $a = a^*$ is chosen so that

$$ P(T(n,a^*) > 0) = P(\max_j Y_j > a^*) = \alpha \qquad (2.2) $$

where $\alpha$ is the desired level of the test and $a^*$ depends implicitly on $\alpha$.
Note that if $a > a^*$ then $T(n,a)$ has mass greater than $1 - \alpha$ at zero
and randomization will be necessary to ensure level $\alpha$. Thus the
nonrandomized level $\alpha$ tests based on statistics of the form (2.1) come
from a member of the one-parameter family of statistics $T(n,\zeta a^*)$ where
$0 < \zeta \le 1$. $\zeta = 1$ yields Fisher's test, while $\zeta = 0$ yields the useless
statistic $T(n,0) = 1$. A power study in Siegel (1979b) showed that the
choice $\zeta = .6$ yielded a good overall test with significant power gains
over Fisher's test against certain alternatives.

The null distribution of this statistic was found to be

$$ P(T(n,\zeta a^*) > t) = \sum_{l=1}^{n} \sum_{k=0}^{l-1} (-1)^{k+l+1} \binom{n}{l} \binom{l-1}{k} \binom{n-1}{k}\, t^k\, (1 - l\zeta a^* - t)_+^{n-k-1} \qquad (2.3) $$

and critical values for $n$ up to 50 were tabled. For large $n$, the terms
with alternating signs can be quite large, leading to a serious
problem with round-off error during computation. This is why we seek the
asymptotic distribution of this statistic.
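For small n the exact null distribution can be evaluated directly. The sketch below assumes the double alternating sum over l = 1..n and k = 0..l-1 with terms (-1)^(k+l+1) C(n,l) C(l-1,k) C(n-1,k) t^k (1 - l*a - t)_+^(n-k-1), where a stands for the threshold actually used, and takes a nonpositive base to contribute zero. Setting t = 0 gives the distribution of the largest spacing, which is solved by bisection for Fisher's threshold. The function names are assumptions, and plain floating point is used, so the round-off problem mentioned above is not addressed.

```python
from math import comb

def survival_T(t, n, a):
    """P(T(n, a) > t) for t >= 0 from the double alternating sum (floating point)."""
    total = 0.0
    for l in range(1, n + 1):
        base = 1.0 - l * a - t
        if base <= 0.0:
            break  # positive part is zero for this and every larger l
        for k in range(l):
            sign = -1.0 if (k + l + 1) % 2 else 1.0
            total += (sign * comb(n, l) * comb(l - 1, k) * comb(n - 1, k)
                      * t**k * base ** (n - k - 1))
    return total

def fisher_threshold(n, alpha):
    """Bisection for a* with P(max spacing > a*) = alpha, i.e. survival_T(0, n, a*)."""
    lo, hi = 1.0 / n, 1.0  # the largest of n spacings lies between 1/n and 1
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if survival_T(0.0, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n, alpha, zeta = 10, 0.05, 0.6
a_star = fisher_threshold(n, alpha)
print(zeta * a_star)                         # Table 1 lists .267 for n = 10 at the .05 level
print(survival_T(0.181, n, zeta * a_star))   # .181 is the exact critical value listed in Table 1
```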

There are two candidates for the asymptotic distribution of $T(n,\zeta a^*)$:
the normal distribution and the $\chi_0^2(\lambda)$ distribution. This follows from
Theorems 3.2 and 4.1 of Siegel (1979a) because the distribution of $V(n,a)$
of that paper is identical to the distribution of $T(n,a)$ here. Theorem
4.1 showed that $T(n,\zeta a^*)$ actually is asymptotically normal for fixed
$\zeta$ as $n \to \infty$. However, for even moderately large $n$, $T(n,\zeta a^*)$ can still
place significant mass at zero, and the normal approximation may not be
very good. Theorem 3.2 allowed $\zeta$ to depend on $n$ so that the mass at
zero was preserved in the limit, and the $\chi_0^2(\lambda)$ distribution was obtained.

Each of these distributions (normal and $\chi_0^2(\lambda)$) yields an approximate
critical value for $T(n,\zeta a^*)$, obtained by matching up the first two moments.
The first two moments of $T(n,\zeta a^*)$ are the same as those calculated
for $D(n,\zeta a^*)$ in Siegel (1978). Thus
$$ E\,T(n,\zeta a^*) = (1 - \zeta a^*)^n \qquad (2.4) $$

$$ E[T(n,\zeta a^*)]^2 = \frac{2(1 - \zeta a^*)^{n+1} + (n-1)(1 - 2\zeta a^*)_+^{n+1}}{n+1} . \qquad (2.5) $$

The normal approximation is based on the critical values of a normal
distribution with these first two moments. The $\chi_0^2(\lambda)$ approximation will
be based on the critical values of $c\chi_0^2(\lambda)$ where the scale factor $c$
and the noncentrality parameter $\lambda$ are chosen so that $c\chi_0^2(\lambda)$ has (2.4)
and (2.5) as its first two moments. Using (1.6) the solution is


$$ c = \frac{2(1 - \zeta a^*)^{n+1} + (n-1)(1 - 2\zeta a^*)_+^{n+1} - (n+1)(1 - \zeta a^*)^{2n}}{4(n+1)(1 - \zeta a^*)^n} \qquad (2.6) $$

$$ \lambda = (1 - \zeta a^*)^n / c . \qquad (2.7) $$
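The moment-matching recipe can be sketched in code. The block below computes the mean and variance of the statistic, sets the scale `c` to variance / (4 * mean) and the noncentrality `lam` to mean / c (so that c times the zero-degrees-of-freedom variable has mean c*lam and variance 4*c^2*lam), and then finds both approximate critical values. The function names, the bisection bracket, and the use of the rounded threshold .267 from Table 1 are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def cdf_chi2_zero_df(t, lam, kmax=500):
    """P(Y <= t) for the Poisson(lam/2) mixture of 0, chi2_2, chi2_4, ..."""
    if t < 0:
        return 0.0
    w = math.exp(-lam / 2.0)          # mass at zero
    total = w
    term = upper = math.exp(-t / 2.0)
    for k in range(1, kmax + 1):
        w *= (lam / 2.0) / k          # Poisson weight for k
        total += w * (1.0 - upper)    # upper = P(chi2_{2k} > t)
        term *= (t / 2.0) / k
        upper += term
    return total

def matched_parameters(n, a):
    """Mean and variance of the statistic at threshold a, plus matched scale and noncentrality."""
    mean = (1.0 - a) ** n
    second = (2.0 * (1.0 - a) ** (n + 1)
              + (n - 1) * max(1.0 - 2.0 * a, 0.0) ** (n + 1)) / (n + 1)
    var = second - mean**2
    c = var / (4.0 * mean)
    lam = mean / c
    return mean, var, c, lam

def critical_values(n, a, alpha):
    mean, var, c, lam = matched_parameters(n, a)
    normal_crit = mean + NormalDist().inv_cdf(1.0 - alpha) * math.sqrt(var)
    lo, hi = 0.0, 1.0                 # the statistic lies in [0, 1]
    for _ in range(60):               # bisection on P(c * Y > x) = alpha
        mid = 0.5 * (lo + hi)
        if 1.0 - cdf_chi2_zero_df(mid / c, lam) > alpha:
            lo = mid
        else:
            hi = mid
    return normal_crit, 0.5 * (lo + hi)

normal_crit, chi_crit = critical_values(10, 0.267, 0.05)
print(normal_crit, chi_crit)  # Table 1 lists .151 and .178 for n = 10 at the .05 level
```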


Critical values for $T(n,\zeta a^*)$ with the preferred choice $\zeta = .6$
were calculated exactly using (2.3) and approximately using the normal
and the $\chi_0^2(\lambda)$ approximations. The results are listed in Table 1 for
levels $\alpha = .05$ and $.01$, and for $n = 10$ through 50. Comparing these,
we see that the $\chi_0^2(\lambda)$ approximation is clearly superior to the normal
approximation. Note that the critical values from the $\chi_0^2(\lambda)$ approximation
are very close to the actual critical values, even for relatively small
values of $n$.

The two columns on the right-hand side of Table 1 show that the
differences between the two approximations are significant.

When $n = 50$ the normal approximation with nominal level $\alpha = .05$ has
actual level $\alpha = .0774$, and the normal approximation with nominal level
$\alpha = .01$ has actual level $\alpha = .0439$. The actual levels obtained using
the $\chi_0^2(\lambda)$ approximation are much closer to the nominal levels: they
are $\alpha = .0509$ and $\alpha = .00985$ respectively.

The normal approximation fails here because $n = 50$
is not yet large enough for those asymptotics to be appropriate. The
amount of mass that $T(n,.6a^*)$ places at zero will diminish to zero in
the limit, but at level $\alpha = .01$ with $n = 50$, there is still a mass of
.674 at zero! The $\chi_0^2(\lambda)$ distribution is not affected by this problem
because it is, like $T(n,\zeta a^*)$ itself, a mixture of mass at zero with
positive continuous variation.

The clear recommendation is thus to use $c\chi_0^2(\lambda)$, where $c$ and $\lambda$ are
found from (2.6) and (2.7), as an approximation to the null distribution
of $T(n,\zeta a^*)$. This will still be able to handle the ultimate asymptotic
normality of $T(n,\zeta a^*)$ because, by (1.7), $\chi_0^2(\lambda)$ is also asymptotically
normal as $\lambda \to \infty$.


Table 1. A comparison of the exact critical values of
$T(n,.6a^*)$ with the approximations calculated using the normal
distribution and the $\chi_0^2(\lambda)$ distribution

                                  Critical Values                      Actual Level
  Level          n    .6a*     Normal   $\chi_0^2(\lambda)$   Exact     Normal   $\chi_0^2(\lambda)$

  alpha = .05   10    .267     .151     .178                  .181      .0816    .0529
                20    .162     .0971    .114                  .116      .0795    .0519
                30    .119     .0742    .0872                 .0880     .0785    .0514
                40    .0944    .0611    .0715                 .0721     .0778    .0511
                50    .0788    .0524    .0612                 .0616     .0774    .0509

  alpha = .01   10    .322     .128     .217                  .214      .0467    .00951
                20    .198     .0782    .134                  .134      .0450    .00979